Online Grammar Compression for Frequent Pattern Discovery

Authors

Shouhei Fukunaga

Yoshimasa Takabatake

Tomohiro I

Hiroshi Sakamoto

Abstract

Various grammar compression algorithms have been proposed in the last decade. Grammar compression represents a string by a restricted context-free grammar (CFG) that derives that string deterministically. An efficient grammar compression algorithm builds a smaller CFG by finding duplicated patterns and eliminating them. This process amounts to frequent pattern discovery by grammatical inference. While any frequent pattern can be found in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio from $\Omega(1/\lg^2 m)$ to $\Omega(1/(\lg^* N \lg m))$, where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg^*$ is the iterated logarithm base 2. For a sufficiently large $N$, $\lg^* N$ is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
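To make two ideas from the abstract concrete, here is a minimal Python sketch. It is not the online algorithm proposed in the paper: `repair_like` is a simple offline, greedy digram-replacement scheme in the style of Re-Pair, used only to illustrate how replacing duplicated patterns with grammar rules surfaces frequent patterns. The names `lg_star`, `repair_like`, and `expand` are hypothetical and chosen for this illustration.

```python
from collections import Counter
import math


def lg_star(n):
    """Iterated logarithm base 2: number of times lg is applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def repair_like(text):
    """Offline greedy digram replacement (Re-Pair style); an illustrative stand-in,
    NOT the paper's online algorithm.

    Returns (sequence, rules), where rules maps each nonterminal to the pair it derives.
    """
    seq = list(text)
    rules = {}
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, nothing left to factor out
        name = f"X{len(rules)}"  # hypothetical nonterminal naming scheme
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Derive the terminal string generated by a symbol."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


seq, rules = repair_like("abababab")
print(seq, rules)
print(expand("X1", rules))            # the repeated pattern behind nonterminal X1
print(lg_star(65536), lg_star(2 ** 65536))
```

In this sketch each nonterminal stands for a substring that occurred repeatedly, which is the sense in which grammar compression doubles as frequent pattern discovery. The `lg_star` values grow extremely slowly with the input, which is why the abstract treats $\lg^* N$ as practically constant.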

Related resources

GrammarViz 2.0: A Tool for Grammar-Based Pattern Discovery in Time Series

The problem of frequent and anomalous patterns discovery in time series has received a lot of attention in the past decade. Addressing the common limitation of existing techniques, which require a pattern length to be known in advance, we recently proposed grammar-based algorithms for efficient discovery of variable length frequent and rare patterns. In this paper we present GrammarViz 2.0, an ...

An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams

A data stream is a continuous, huge, fast changing, rapid, infinite sequence of data elements. The nature of streaming data makes it essential to use online algorithms which require only one scan over the data for knowledge discovery. In this paper, we propose a new single-pass algorithm, called DSMFI (Data Stream Mining for Frequent Itemsets), to mine all frequent itemsets over the entire hist...

Grammar Compression: Grammatical Inference by Compression and Its Application to Real Data

A grammatical inference algorithm tries to find as small a grammar as possible representing a potentially infinite sequence of strings. Here, let us consider a simple restriction: the input is a finite sequence, or possibly a singleton set. The restricted problem, called grammar compression, is then to find the smallest CFG generating just the input. In the last decade many researchers have...

Sequitur-based Inference and Analysis Framework for Malicious System Behavior

Targeted attacks on IT systems are a rising threat against the confidentiality of sensitive data and the availability of critical systems. With the emergence of Advanced Persistent Threats (APTs), it has become more important than ever to fully understand the particulars of such attacks. Grammar inference offers a powerful foundation for the automated extraction of behavioral patterns from sequ...

The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing

The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching ...

Publication date: 2016
